Bochuan Cao

SkillGrad: Optimizing Agent Skills Like Gradient Descent

May 26, 2026

Explore Data Left Behind in Reinforcement Learning for Reasoning Language Models

Nov 06, 2025

On the Convergence of Moral Self-Correction in Large Language Models

Oct 08, 2025

Monitoring Decoding: Mitigating Hallucination via Evaluating the Factuality of Partial Response during Generation

Mar 05, 2025

TruthFlow: Truthful LLM Generation via Representation Flow Correction

Feb 06, 2025

Data Free Backdoor Attacks

Dec 09, 2024

AdvI2I: Adversarial Image Attack on Image-to-Image Diffusion models

Oct 28, 2024

On the Intrinsic Self-Correction Capability of LLMs: Uncertainty and Latent Concept

Jun 04, 2024

XPrompt: Explaining Large Language Model's Generation via Joint Prompt Attribution

May 30, 2024

Personalized Steering of Large Language Models: Versatile Steering Vectors Through Bi-directional Preference Optimization

May 28, 2024